feat(context): Add Prompt Caching to Code Context (CODY-4807) #6878
Conversation
...lin/lib/src/main/kotlin/com/sourcegraph/cody/agent/protocol_generated/Cache_controlParams.kt
This reverts commit 1aa1358.
Blocking comments: 1) undo the snapshot/recording changes by not enabling prompt caching in agent integration tests, and 2) come up with a strategy to measure the effectiveness of prompt caching.
Currently, in the diff, prompt caching is unconditionally enabled everywhere and we don't track the impact with telemetry events. My expectation is that we should be able to enable/disable this feature dynamically with feature flags, and we should be able to answer questions like how much faster responses are with prompt caching (or what percentage of tokens are cached).
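A minimal sketch of the gating being asked for, assuming caching is applied only when a flag evaluates true. The `PromptMessage` shape, `applyPromptCaching` name, and the flag-off fallback are hypothetical placeholders, not the actual Cody APIs; flag evaluation and telemetry recording are left out.

```typescript
// Hypothetical message shape: a text segment that may carry a
// cache_control marker (names are illustrative, not Cody's real types).
interface PromptMessage {
    text: string
    cacheControl?: { type: 'ephemeral' }
}

function applyPromptCaching(
    messages: PromptMessage[],
    cachingEnabled: boolean // e.g. the result of a feature-flag evaluation
): PromptMessage[] {
    if (!cachingEnabled) {
        return messages // flag off: send the prompt unchanged
    }
    // Mark the large, stable prefix (e.g. the code context) as cacheable;
    // trailing user turns change every request and are left uncached.
    return messages.map((m, i) =>
        i === 0 ? { ...m, cacheControl: { type: 'ephemeral' as const } } : m
    )
}
```

With this shape, disabling the flag by default in agent integration tests would leave the recorded requests byte-identical, which is the point raised above about snapshot recordings.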
@@ -327,3 +332,12 @@ export class ClientConfigSingleton {
        return this.fetchConfigEndpoint(signal, config)
    }
}
// It's really complicated to access CodyClientConfig from functions like utils.ts
I think it's worth getting team alignment on what the singleton story should look like for `CodyClientConfig`. Currently, the interface is poorly designed in that it mixes information from the server AND custom local settings. A cleaner solution would be to separate the `/.api/client-config` settings from locally inferred settings. Going forward, we should stop using the site version as a feature-gating signal and exclusively use `/.api/client-config`, as you have done with `api-version=7` in this PR.
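One way the proposed separation could look, sketched with hypothetical type and field names (these are illustrative, not the actual Cody interfaces):

```typescript
// Fields populated verbatim from the /.api/client-config response.
// Field names here are examples only.
interface ServerClientConfig {
    chatEnabled: boolean
    latestSupportedAPIVersion?: number
}

// Fields derived on the client and never sent by the server.
interface LocalClientConfig {
    promptCachingEnabled: boolean
}

// A combined view that keeps the two sources distinguishable at the
// type level, instead of flattening them into one interface.
interface ResolvedClientConfig {
    server: ServerClientConfig
    local: LocalClientConfig
}
```

Keeping the two halves nested rather than merged makes it obvious at every call site whether a setting came from the server endpoint or from local inference.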
Not a blocking comment; I just want to call out that I'm not 100% happy with how the current code is organized.
Ack. Can revisit this problem later.
I think we disable feature flags by default in agent integration tests, so if you gate caching with a feature flag, then the existing snapshot tests should pass without updating the recordings.
Stamping since I will be afk until Tuesday and we are targeting this for Wednesday. I recommend getting another approval on the PR once you have addressed the comments above.
Problem (why)
Currently, the daily cost and token usage of our models are very high. We want to find ways to reduce them.
Solution (context)
Prompt caching can significantly reduce token costs. Each cache hit reduces costs by 90%, while each cache miss increases costs by 25%.
After some initial analysis, we decided to start by implementing prompt caching for Claude models.
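A back-of-the-envelope model of those figures (a cache write costs 1.25x the base input-token price, a cache hit costs 0.1x) shows that caching pays for itself as soon as the cached prefix is reused once. The function name is illustrative:

```typescript
// Relative-cost model of the figures above: a cache write costs 25% more
// than an uncached input token, and a cache hit costs 90% less.
// All prices are relative to the base input-token price (1.0 per token).
function relativePromptCost(tokens: number, uses: number, cached: boolean): number {
    if (!cached) {
        return tokens * uses // every request pays the full input price
    }
    // First use writes the cache (1.25x); remaining uses are hits (0.1x).
    return tokens * (1.25 + 0.1 * (uses - 1))
}
```

For example, a 1,000-token prefix used twice costs 1,350 relative units with caching versus 2,000 without; a prefix used only once costs 1,250 versus 1,000, which is the 25% cache-miss penalty. This is why gating and measurement matter: caching only wins when prefixes are actually reused within the cache TTL.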
Implementation (what)
We use `cache_control: ephemeral`, which creates a cache with a 5 min TTL. See the Anthropic Docs for context.
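For context, a request that marks a stable context block as cacheable looks roughly like this. The `cache_control` field shape follows the public Anthropic Messages API; the model ID and context string are placeholders, and this is a sketch rather than the PR's actual request-building code:

```typescript
// Sketch of an Anthropic Messages API request body with a cacheable
// system block. Everything up to and including the block carrying
// cache_control is cached (ephemeral = 5 min TTL).
const requestBody = {
    model: 'claude-3-5-sonnet-20241022', // placeholder model ID
    max_tokens: 1024,
    system: [
        {
            type: 'text',
            text: '<large, stable code context goes here>', // placeholder
            cache_control: { type: 'ephemeral' },
        },
    ],
    messages: [{ role: 'user', content: 'Explain this function.' }],
}
```

Placing the marker on the large, stable prefix (the code context) and leaving the changing user turn outside it is what lets repeated requests over the same context hit the cache.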
Test plan